Poly-Sarcosine and Poly(ethylene-glycol) interactions with proteins investigated using molecular dynamics simulations
Nanoparticles coated with hydrophilic polymers often show a reduction in
unspecific interactions with the biological environment, which improves their
biocompatibility. The molecular determinants of this reduction are not yet
well understood, and knowing them may help improve nanoparticle
design. Here we address, using molecular dynamics simulations, the interactions
of human serum albumin, the most abundant serum protein, with two promising
hydrophilic polymers used for the coating of therapeutic nanoparticles,
poly(ethylene-glycol) and poly-sarcosine. By simulating the protein immersed in
a polymer-water mixture, we show that the two polymers have a very similar
affinity for the protein surface, both in terms of the amount of polymer
adsorbed and also in terms of the type of amino acids mainly involved in the
interactions. We further analyze the kinetics of adsorption and how it affects
the polymer conformations. Minor differences between the polymers are observed
in the thickness of the adsorption layer, which are related to the different
degrees of flexibility of the two molecules. In comparison, poly-alanine, an
isomer of poly-sarcosine known to self-aggregate and induce protein
aggregation, shows a significantly larger affinity for the protein surface than
PEG and PSar, which we show to be related not to a different pattern of
interactions with the protein surface, but to the different way the polymer
interacts with water
Flexible domain prediction using mixed effects random forests
This paper promotes the use of random forests as versatile tools for estimating spatially disaggregated indicators in the presence of small area-specific sample sizes. Small area estimators are predominantly conceptualised within the regression setting and rely on linear mixed models to account for the hierarchical structure of the survey data. In contrast, machine learning methods offer non-linear and non-parametric alternatives, combining excellent predictive performance and a reduced risk of model misspecification. Mixed effects random forests combine advantages of regression forests with the ability to model hierarchical dependencies. This paper provides a coherent framework based on mixed effects random forests for estimating small area averages and proposes a non-parametric bootstrap estimator for assessing the uncertainty of the estimates. We illustrate advantages of our proposed methodology using Mexican income data from the state of Nuevo León. Finally, the methodology is evaluated in model-based and design-based simulations comparing the proposed methodology to traditional regression-based approaches for estimating small area averages
Small Area with Multiply Imputed Survey Data
In this article, we propose a framework for small area estimation with multiply imputed survey data. Many statistical surveys suffer from (a) high nonresponse rates due to sensitive questions and response burden and (b) too small sample sizes to allow for reliable estimates on (unplanned) disaggregated levels due to budget constraints. One way to deal with missing values is to replace them by several plausible/imputed values based on a model. Small area estimation, such as the model by Fay and Herriot, is applied to estimate regionally disaggregated indicators when direct estimates are imprecise. The framework presented tackles simultaneously multiply imputed values and imprecise direct estimates. In particular, we extend the general class of transformed Fay-Herriot models to account for the additional uncertainty from multiple imputation. We derive three special cases of the Fay-Herriot model with particular transformations and provide point and mean squared error estimators. Depending on the case, the mean squared error is estimated by analytic solutions or resampling methods. Comprehensive simulations in a controlled environment show that the proposed methodology leads to reliable and precise results in terms of bias and mean squared error. The methodology is illustrated by a real data example using European wealth data
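For orientation, the standard area-level Fay-Herriot model that the abstract builds on can be written as below. This is the textbook, untransformed form with generic notation chosen here for illustration (area index d, direct estimate, known sampling variance); it is not the transformed, multiple-imputation extension proposed in the article.

```latex
% Sampling model: the direct estimator is noisy around the true area value.
\hat{\theta}_d^{\mathrm{dir}} = \theta_d + e_d, \qquad e_d \sim N(0, \sigma_{e_d}^2), \quad \sigma_{e_d}^2 \text{ assumed known}

% Linking model: the true area value depends on area-level covariates.
\theta_d = x_d^{\top}\beta + u_d, \qquad u_d \sim N(0, \sigma_u^2)

% Empirical best linear unbiased predictor: a shrinkage combination.
\hat{\theta}_d^{\mathrm{FH}} = \hat{\gamma}_d\, \hat{\theta}_d^{\mathrm{dir}} + (1 - \hat{\gamma}_d)\, x_d^{\top}\hat{\beta},
\qquad \hat{\gamma}_d = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \sigma_{e_d}^2}
```

Areas with imprecise direct estimates (large sampling variance) receive a small shrinkage weight and lean more heavily on the regression-synthetic part.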
Modelling the distribution of health related quality of life of advanced melanoma patients in a longitudinal multi-centre clinical trial using M-quantile random effects regression
Health-related quality of life assessment is important in the clinical
evaluation of patients with metastatic disease and may offer useful
information for understanding the clinical effectiveness of a treatment. To
assess if a set of explanatory variables impacts on the health-related quality
of life, regression models are routinely adopted. However, the interest of
researchers may be focussed on modelling other parts (e.g. quantiles) of this
conditional distribution. In this paper, we present an approach based on
quantile and M-quantile regression to achieve this goal. We applied the
methodologies to a prospective, randomized, multi-centre clinical trial. In
order to take into account the hierarchical nature of the data we extended the
M-quantile regression model to a three-level random effects specification and
estimated it by maximum likelihood
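As a rough illustration of what modelling quantiles means, the sketch below uses the check (pinball) loss that underlies ordinary quantile regression; M-quantile regression generalises this idea with an asymmetric robust loss. The scores and function names are hypothetical, and the example has no covariates or random effects, so it is not the three-level model of the paper.

```python
def check_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss for one residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def sample_quantile_by_check_loss(values, tau):
    """Return the observed value that minimises the total check loss."""
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


# Hypothetical quality-of-life scores, for illustration only.
scores = [12.0, 35.0, 41.0, 58.0, 63.0, 70.0, 88.0]
median_fit = sample_quantile_by_check_loss(scores, 0.5)
lower_fit = sample_quantile_by_check_loss(scores, 0.25)
```

Changing the level tau moves the fitted value through the distribution, which is how quantile-type regression describes parts of the conditional distribution other than its centre.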
Robust small area estimation under spatial non-stationarity
Geographically weighted small area methods have been studied in the literature
for small area estimation. Although these approaches are useful for estimating
small area means efficiently under strict parametric assumptions, they can
be very sensitive to outliers in the data. In this paper, we propose a robust
extension of the geographically weighted empirical best linear unbiased
predictor (GWEBLUP). In particular, we introduce robust projective and
predictive small area estimators under spatial non-stationarity. Mean squared
error estimation is performed by two different analytic approaches that
account for the spatial structure in the data. The results from the
model-based simulations indicate that the proposed approach may lead to gains in
terms of efficiency. Finally, the methodology is demonstrated in an
illustrative application for estimating the average total cash costs for farms
in Australia
Estimating regional income indicators under transformations and access to limited population auxiliary information
Spatially disaggregated income indicators are typically estimated by using model-based methods that assume access to auxiliary information from population micro-data. In many countries, such as Germany and the UK, population micro-data are not publicly available. In this work we propose small area methodology for the case when only aggregate population-level auxiliary information is available. We use data-driven transformations of the response to satisfy the parametric assumptions of the models used. In the absence of population micro-data, appropriate bias-corrections for small area prediction are needed. Under the approach we propose in this paper, aggregate statistics (means and covariances) and kernel density estimation are used to resolve the issue of not having access to population micro-data. We further explore the estimation of the mean squared error using the parametric bootstrap. Extensive model-based and design-based simulations are used to compare the proposed method to alternative methods. Finally, the proposed methodology is applied to the 2011 Socio-Economic Panel and aggregate census information from the same year to estimate the average income for 96 regional planning regions in Germany
Releasing survey microdata with exact cluster locations and additional privacy safeguards
Household survey programs around the world publish fine-granular georeferenced microdata to support research on the interdependence of human livelihoods and their surrounding environment. To safeguard the respondents’ privacy, micro-level survey data is usually (pseudo)-anonymized through deletion or perturbation procedures such as obfuscating the true location of data collection. This, however, poses a challenge to emerging approaches that augment survey data with auxiliary information on a local level. Here, we propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards through synthetically generated data using generative models. We back our proposal with experiments using data from the 2011 Costa Rican census and satellite-derived auxiliary information. Our strategy reduces the respondents’ re-identification risk for any number of disclosed attributes by 60–80% even under re-identification attempts
Estimation of Linear and Non-Linear Indicators using Interval Censored Income Data
Among a variety of small area estimation methods, one popular approach for the
estimation of linear and non-linear indicators is the empirical best
predictor. However, parameter estimation using standard maximum likelihood
methods is not possible when the dependent variable of the underlying nested
error regression model is censored to specific intervals. This is often the
case for income variables. Therefore, this work proposes an estimation method
that enables the estimation of the regression parameters of the nested error
regression model using interval censored data. The introduced method is based
on the stochastic expectation maximization algorithm. Since the stochastic
expectation maximization method relies on the Gaussian assumptions of the
error terms, transformations are incorporated into the algorithm to handle
departures from normality. The estimation of the mean squared error of the
empirical best predictors is facilitated by a parametric bootstrap which
captures the additional uncertainty coming from the interval censored
dependent variable. The proposed method is validated by
extensive model-based simulations
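The idea of a stochastic expectation maximization step for interval censored data can be sketched as follows. This is a deliberately simplified illustration, assuming a plain normal model with no covariates, no area-level random effects and no transformation; the class bounds, sample size and all function names are hypothetical and not taken from the paper.

```python
import bisect
import random
from statistics import NormalDist, fmean, pstdev


def to_class(value, bounds):
    """Map a value to the (lower, upper) class that contains it."""
    idx = bisect.bisect_right(bounds, value) - 1
    idx = min(max(idx, 0), len(bounds) - 2)  # outliers go to the outer classes
    return bounds[idx], bounds[idx + 1]


def draw_truncated_normal(mu, sigma, lower, upper, rng):
    """Draw from N(mu, sigma^2) restricted to [lower, upper] via the inverse CDF."""
    dist = NormalDist(mu, sigma)
    u = rng.uniform(dist.cdf(lower), dist.cdf(upper))
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < u < 1
    return min(max(dist.inv_cdf(u), lower), upper)


def stochastic_em(intervals, n_iter=200, burn_in=100, seed=1):
    """Estimate mean and standard deviation from interval censored data."""
    rng = random.Random(seed)
    midpoints = [(a + b) / 2 for a, b in intervals]
    mu, sigma = fmean(midpoints), pstdev(midpoints)  # starting values
    kept_mu, kept_sigma = [], []
    for iteration in range(n_iter):
        # S-step: impute a pseudo-value inside each observed interval.
        pseudo = [draw_truncated_normal(mu, sigma, a, b, rng) for a, b in intervals]
        # M-step: re-estimate the parameters from the completed data.
        mu, sigma = fmean(pseudo), pstdev(pseudo)
        if iteration >= burn_in:
            kept_mu.append(mu)
            kept_sigma.append(sigma)
    # Average the post-burn-in iterates to smooth out simulation noise.
    return fmean(kept_mu), fmean(kept_sigma)


# Hypothetical demo: simulate values, keep only their classes, then re-estimate.
demo_rng = random.Random(42)
bounds = [0, 7, 8, 9, 10, 11, 12, 13, 20]
intervals = [to_class(demo_rng.gauss(10.0, 2.0), bounds) for _ in range(500)]
mu_hat, sigma_hat = stochastic_em(intervals)
```

Only the class limits enter the estimation, so the same loop applies when the underlying values are never observed, which is the situation the abstract describes for income variables.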
Estimating literacy rates in Senegal
Modern systems of official statistics require the accurate and timely
estimation of socio-demographic indicators for disaggregated geographical
regions. Traditional data collection methods such as censuses or household
surveys impose great financial and organizational burdens for National
Statistical Institutes. The rise of new information and communication
technologies offers promising sources to mitigate these shortcomings. In this
paper we propose a unified approach for National Statistical Institutes based
on small area estimation that allows for the estimation of socio-demographic
indicators by using mobile phone data. In particular, the methodology is
applied to mobile phone data from Senegal for deriving sub-national estimates
of the share of illiterates disaggregated by gender. The estimates are used to
identify hot spots of illiteracy with a need for additional infrastructure or
policy adjustments. Although the paper focuses on literacy as a particular
socio-demographic indicator, the proposed approach is applicable to indicators
from national statistics in general